{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "m7hbib3bSGO9"
      },
      "source": [
        "**Copyright 2020 The TensorFlow Authors.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "mEE8NFIMSGO-"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SyiSRgdtSGPC"
      },
      "source": [
        "# Weight clustering in Keras example"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kW3os956SGPD"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/model_optimization/guide/clustering/clustering_example\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/g3doc/guide/clustering/clustering_example.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/g3doc/guide/clustering/clustering_example.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/model-optimization/tensorflow_model_optimization/g3doc/guide/clustering/clustering_example.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dKnJyAaASGPD"
      },
      "source": [
        "## Overview\n",
        "\n",
        "Welcome to the end-to-end example for *weight clustering*, part of the TensorFlow Model Optimization Toolkit.\n",
        "\n",
        "### Other pages\n",
        "\n",
        "For an introduction to what weight clustering is and to determine if you should use it (including what's supported), see the [overview page](https://www.tensorflow.org/model_optimization/guide/clustering).\n",
        "\n",
        "To quickly find the APIs you need for your use case (beyond fully clustering a model with 16 clusters), see the [comprehensive guide](https://www.tensorflow.org/model_optimization/guide/clustering/clustering_comprehensive_guide).\n",
        "\n",
        "### Contents\n",
        "\n",
        "In the tutorial, you will:\n",
        "\n",
        "1. Train a `keras` model for the MNIST dataset from scratch.\n",
        "2. Fine-tune the model by applying the weight clustering API and see the accuracy.\n",
        "3. Create a 6x smaller TF and TFLite models from clustering.\n",
        "4. Create a 8x smaller TFLite model from combining weight clustering and post-training quantization.\n",
        "5. See the persistence of accuracy from TF to TFLite."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RgcQznnZSGPE"
      },
      "source": [
        "## Setup\n",
        "\n",
        "You can run this Jupyter Notebook in your local [virtualenv](https://www.tensorflow.org/install/pip?lang=python3#2.-create-a-virtual-environment-recommended) or [colab](https://colab.sandbox.google.com/). For details of setting up dependencies, please refer to the [installation guide](https://www.tensorflow.org/model_optimization/guide/install). "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3asgXMqnSGPE"
      },
      "outputs": [],
      "source": [
        "! pip install -q tensorflow-model-optimization"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gL6JiLXkSGPI"
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "from tensorflow_model_optimization.python.core.keras.compat import keras\n",
        "\n",
        "import numpy as np\n",
        "import tempfile\n",
        "import zipfile\n",
        "import os"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dKzOfl5FSGPL"
      },
      "source": [
        "## Train a keras model for MNIST without clustering"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "w7Fd6jZ7SGPL"
      },
      "outputs": [],
      "source": [
        "# Load MNIST dataset\n",
        "(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()\n",
        "\n",
        "# Normalize the input image so that each pixel value is between 0 to 1.\n",
        "train_images = train_images / 255.0\n",
        "test_images  = test_images / 255.0\n",
        "\n",
        "# Define the model architecture.\n",
        "model = keras.Sequential([\n",
        "    keras.layers.InputLayer(input_shape=(28, 28)),\n",
        "    keras.layers.Reshape(target_shape=(28, 28, 1)),\n",
        "    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),\n",
        "    keras.layers.MaxPooling2D(pool_size=(2, 2)),\n",
        "    keras.layers.Flatten(),\n",
        "    keras.layers.Dense(10)\n",
        "])\n",
        "\n",
        "# Train the digit classification model\n",
        "model.compile(optimizer='adam',\n",
        "              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "              metrics=['accuracy'])\n",
        "\n",
        "model.fit(\n",
        "    train_images,\n",
        "    train_labels,\n",
        "    validation_split=0.1,\n",
        "    epochs=10\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rBOQ8MeESGPO"
      },
      "source": [
        "### Evaluate the baseline model and save it for later usage"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "HYulekocSGPP"
      },
      "outputs": [],
      "source": [
        "_, baseline_model_accuracy = model.evaluate(\n",
        "    test_images, test_labels, verbose=0)\n",
        "\n",
        "print('Baseline test accuracy:', baseline_model_accuracy)\n",
        "\n",
        "_, keras_file = tempfile.mkstemp('.h5')\n",
        "print('Saving model to: ', keras_file)\n",
        "keras.models.save_model(model, keras_file, include_optimizer=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cWPgcnjKSGPR"
      },
      "source": [
        "## Fine-tune the pre-trained model with clustering"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y2wKK7w9SGPS"
      },
      "source": [
        "Apply the `cluster_weights()` API to a whole pre-trained model to demonstrate its effectiveness in reducing the model size after applying zip while keeping decent accuracy. For how best to balance the accuracy and compression rate for your use case, please refer to the per layer example in the [comprehensive guide](https://www.tensorflow.org/model_optimization/guide/clustering/clustering_comprehensive_guide).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ea40z522SGPT"
      },
      "source": [
        "### Define the model and apply the clustering API"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7aOB5vjOZMTS"
      },
      "source": [
        "Before you pass the model to the clustering API, make sure it is trained and shows some acceptable accuracy."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OzqKKt0mSGPT"
      },
      "outputs": [],
      "source": [
        "import tensorflow_model_optimization as tfmot\n",
        "\n",
        "cluster_weights = tfmot.clustering.keras.cluster_weights\n",
        "CentroidInitialization = tfmot.clustering.keras.CentroidInitialization\n",
        "\n",
        "clustering_params = {\n",
        "  'number_of_clusters': 16,\n",
        "  'cluster_centroids_init': CentroidInitialization.LINEAR\n",
        "}\n",
        "\n",
        "# Cluster a whole model\n",
        "clustered_model = cluster_weights(model, **clustering_params)\n",
        "\n",
        "# Use smaller learning rate for fine-tuning clustered model\n",
        "opt = keras.optimizers.Adam(learning_rate=1e-5)\n",
        "\n",
        "clustered_model.compile(\n",
        "  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "  optimizer=opt,\n",
        "  metrics=['accuracy'])\n",
        "\n",
        "clustered_model.summary()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ev4MyClmSGPW"
      },
      "source": [
        "### Fine-tune the model and evaluate the accuracy against baseline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vQoy9CcASGPX"
      },
      "source": [
        "Fine-tune the model with clustering for 1 epoch."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jn29-coXSGPX"
      },
      "outputs": [],
      "source": [
        "# Fine-tune model\n",
        "clustered_model.fit(\n",
        "  train_images,\n",
        "  train_labels,\n",
        "  batch_size=500,\n",
        "  epochs=1,\n",
        "  validation_split=0.1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dvaZKoxtTORx"
      },
      "source": [
        "For this example, there is minimal loss in test accuracy after clustering, compared to the baseline."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bE7MxpWLTaQ1"
      },
      "outputs": [],
      "source": [
        "_, clustered_model_accuracy = clustered_model.evaluate(\n",
        "  test_images, test_labels, verbose=0)\n",
        "\n",
        "print('Baseline test accuracy:', baseline_model_accuracy)\n",
        "print('Clustered test accuracy:', clustered_model_accuracy)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VXfPMa6ISGPd"
      },
      "source": [
        "## Create **6x** smaller models from clustering"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1zr_QIhcUeuC"
      },
      "source": [
        "Both `strip_clustering` and applying a standard compression algorithm (e.g. via gzip) are necessary to see the compression benefits of clustering. \n",
        "\n",
        "First, create a compressible model for TensorFlow. Here, `strip_clustering` removes all variables (e.g. `tf.Variable` for storing the cluster centroids and the indices) that clustering only needs during training, which would otherwise add to model size during inference."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4h6tSvMzSGPd"
      },
      "outputs": [],
      "source": [
        "final_model = tfmot.clustering.keras.strip_clustering(clustered_model)\n",
        "\n",
        "_, clustered_keras_file = tempfile.mkstemp('.h5')\n",
        "print('Saving clustered model to: ', clustered_keras_file)\n",
        "keras.models.save_model(final_model, clustered_keras_file, \n",
        "                           include_optimizer=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jZcotzPSVBtu"
      },
      "source": [
        "Then, create compressible models for TFLite. You can convert the clustered model to a format that's runnable on your targeted backend. TensorFlow Lite is an example you can use to deploy to mobile devices."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "v2N47QW6SGPh"
      },
      "outputs": [],
      "source": [
        "clustered_tflite_file = '/tmp/clustered_mnist.tflite'\n",
        "converter = tf.lite.TFLiteConverter.from_keras_model(final_model)\n",
        "tflite_clustered_model = converter.convert()\n",
        "with open(clustered_tflite_file, 'wb') as f:\n",
        "  f.write(tflite_clustered_model)\n",
        "print('Saved clustered TFLite model to:', clustered_tflite_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S7amG_9XV-w9"
      },
      "source": [
        "Define a helper function to actually compress the models via gzip and measure the zipped size."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1XJ4QBMpW5JB"
      },
      "outputs": [],
      "source": [
        "def get_gzipped_model_size(file):\n",
        "  # It returns the size of the gzipped model in bytes.\n",
        "  import os\n",
        "  import zipfile\n",
        "\n",
        "  _, zipped_file = tempfile.mkstemp('.zip')\n",
        "  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:\n",
        "    f.write(file)\n",
        "\n",
        "  return os.path.getsize(zipped_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "INeAOWRBSGPj"
      },
      "source": [
        "Compare and see that the models are **6x** smaller from clustering"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SG1MgZCeSGPk"
      },
      "outputs": [],
      "source": [
        "print(\"Size of gzipped baseline Keras model: %.2f bytes\" % (get_gzipped_model_size(keras_file)))\n",
        "print(\"Size of gzipped clustered Keras model: %.2f bytes\" % (get_gzipped_model_size(clustered_keras_file)))\n",
        "print(\"Size of gzipped clustered TFlite model: %.2f bytes\" % (get_gzipped_model_size(clustered_tflite_file)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5TOgpEGfSGPn"
      },
      "source": [
        "## Create an **8x** smaller TFLite model from combining weight clustering and post-training quantization"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BQb50aC3SGPn"
      },
      "source": [
        "You can apply post-training quantization to the clustered model for additional benefits."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XyHC8euLSGPo"
      },
      "outputs": [],
      "source": [
        "converter = tf.lite.TFLiteConverter.from_keras_model(final_model)\n",
        "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
        "tflite_quant_model = converter.convert()\n",
        "\n",
        "_, quantized_and_clustered_tflite_file = tempfile.mkstemp('.tflite')\n",
        "\n",
        "with open(quantized_and_clustered_tflite_file, 'wb') as f:\n",
        "  f.write(tflite_quant_model)\n",
        "\n",
        "print('Saved quantized and clustered TFLite model to:', quantized_and_clustered_tflite_file)\n",
        "print(\"Size of gzipped baseline Keras model: %.2f bytes\" % (get_gzipped_model_size(keras_file)))\n",
        "print(\"Size of gzipped clustered and quantized TFlite model: %.2f bytes\" % (get_gzipped_model_size(quantized_and_clustered_tflite_file)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "U-yBcocGSGPv"
      },
      "source": [
        "## See the persistence of accuracy from TF to TFLite"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Jh_pcf0XSGPv"
      },
      "source": [
        "Define a helper function to evaluate the TFLite model on the test dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EJ9B7pRISGPw"
      },
      "outputs": [],
      "source": [
        "def eval_model(interpreter):\n",
        "  input_index = interpreter.get_input_details()[0][\"index\"]\n",
        "  output_index = interpreter.get_output_details()[0][\"index\"]\n",
        "\n",
        "  # Run predictions on every image in the \"test\" dataset.\n",
        "  prediction_digits = []\n",
        "  for i, test_image in enumerate(test_images):\n",
        "    if i % 1000 == 0:\n",
        "      print('Evaluated on {n} results so far.'.format(n=i))\n",
        "    # Pre-processing: add batch dimension and convert to float32 to match with\n",
        "    # the model's input data format.\n",
        "    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)\n",
        "    interpreter.set_tensor(input_index, test_image)\n",
        "\n",
        "    # Run inference.\n",
        "    interpreter.invoke()\n",
        "\n",
        "    # Post-processing: remove batch dimension and find the digit with highest\n",
        "    # probability.\n",
        "    output = interpreter.tensor(output_index)\n",
        "    digit = np.argmax(output()[0])\n",
        "    prediction_digits.append(digit)\n",
        "\n",
        "  print('\\n')\n",
        "  # Compare prediction results with ground truth labels to calculate accuracy.\n",
        "  prediction_digits = np.array(prediction_digits)\n",
        "  accuracy = (prediction_digits == test_labels).mean()\n",
        "  return accuracy"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0swuxbPmSGPy"
      },
      "source": [
        "You evaluate the model, which has been clustered and quantized, and then see the accuracy from TensorFlow persists to the TFLite backend."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "RFD4LXjpSGPz"
      },
      "outputs": [],
      "source": [
        "interpreter = tf.lite.Interpreter(model_content=tflite_quant_model)\n",
        "interpreter.allocate_tensors()\n",
        "\n",
        "test_accuracy = eval_model(interpreter)\n",
        "\n",
        "print('Clustered and quantized TFLite test_accuracy:', test_accuracy)\n",
        "print('Clustered TF test accuracy:', clustered_model_accuracy)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JgXTEXC7SGP1"
      },
      "source": [
        "## Conclusion"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7JhbpowqSGP1"
      },
      "source": [
        "In this tutorial, you saw how to create clustered models with the TensorFlow Model Optimization Toolkit API. More specifically, you've been through an end-to-end example for creating an 8x smaller model for MNIST with minimal accuracy difference. We encourage you to try this new capability, which can be particularly important for deployment in resource-constrained environments.\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "clustering_example.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
